# vLLM Optimization
## QwQ 32B INT8 W8A8
ospatch · Apache-2.0 · Large Language Model · Transformers · English · 590 downloads · 4 likes

INT8-quantized version of QwQ-32B: both weights and activations are reduced to 8-bit (W8A8), lowering the memory footprint and enabling faster INT8 compute for vLLM inference.
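W8A8 means both the weight matrix and the runtime activations are stored as INT8 with a floating-point scale. A minimal numpy sketch of the idea (illustrative only; vLLM's actual kernels use fused INT8 GEMMs, and the shapes here are hypothetical):

```python
import numpy as np

def quantize_int8(x):
    """Symmetric per-tensor INT8 quantization: x ≈ scale * q."""
    scale = np.abs(x).max() / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

# Toy weight matrix and activation batch (hypothetical sizes).
rng = np.random.default_rng(0)
w = rng.standard_normal((64, 64)).astype(np.float32)
a = rng.standard_normal((8, 64)).astype(np.float32)

qw, sw = quantize_int8(w)   # W8: weights quantized offline
qa, sa = quantize_int8(a)   # A8: activations quantized at runtime

# INT8 matmul accumulated in INT32, then dequantized with both scales.
y_q = qa.astype(np.int32) @ qw.astype(np.int32).T
y = y_q.astype(np.float32) * (sa * sw)

y_ref = a @ w.T
err = np.abs(y - y_ref).max() / np.abs(y_ref).max()
print(f"max relative error: {err:.4f}")  # small for 8-bit, roughly the 1e-2 range
```

The key point is that the expensive matmul runs entirely in integer arithmetic; the two scales are applied once to the INT32 accumulator.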
## Whisper Large V3 W4A16
nm-testing · Apache-2.0 · Speech Recognition · Transformers · English · 20 downloads · 1 like

Quantized version of openai/whisper-large-v3 with INT4 weights while activations are kept in FP16 (W4A16), suitable for vLLM inference.
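In contrast to W8A8, W4A16 is weight-only: the 4-bit weights are dequantized (typically per group of columns, since a single scale is too coarse at 4 bits) and the matmul runs in FP16. A simplified numpy sketch under those assumptions (group size and shapes are illustrative, not taken from this model):

```python
import numpy as np

def quantize_w4_groupwise(w, group_size=32):
    """Weight-only INT4 quantization with one scale per group of columns."""
    rows, cols = w.shape
    w_g = w.reshape(rows, cols // group_size, group_size)
    scale = np.abs(w_g).max(axis=-1, keepdims=True) / 7.0      # symmetric INT4 range [-7, 7]
    q = np.clip(np.round(w_g / scale), -7, 7).astype(np.int8)  # 4-bit values stored in int8
    return q, scale

rng = np.random.default_rng(0)
w = rng.standard_normal((16, 64)).astype(np.float32)
a = rng.standard_normal((4, 64)).astype(np.float16)  # A16: activations stay FP16

q, scale = quantize_w4_groupwise(w)
w_deq = (q * scale).reshape(w.shape).astype(np.float16)

# The matmul itself runs in FP16 against the dequantized weights.
y = a @ w_deq.T
err = np.abs(y.astype(np.float32) - a.astype(np.float32) @ w.T).max()
print("max abs error:", err)
```

Per-group scales keep the 4-bit rounding error local: one outlier weight only degrades its own group of 32 values rather than the whole tensor.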
## Qwen2.5 VL 3B Instruct Quantized W8A8
RedHatAI · Apache-2.0 · Image-to-Text · Transformers · English · 274 downloads · 1 like

Quantized version of Qwen/Qwen2.5-VL-3B-Instruct with INT8 weights and INT8 activations (W8A8); accepts combined image-and-text input and produces text output.
## Pixtral 12B FP8 Dynamic
RedHatAI · Apache-2.0 · Image-to-Text · Safetensors · Multilingual · 87.31k downloads · 9 likes

pixtral-12b-FP8-dynamic is a quantized version of mistral-community/pixtral-12b. Quantizing weights and activations to FP8 cuts disk size and GPU memory requirements by roughly 50%. Suitable for commercial and research use in multiple languages.
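The roughly 50% figure follows directly from halving the bytes per parameter (2-byte BF16 down to 1-byte FP8). A quick sanity check with illustrative numbers, ignoring embeddings, the KV cache, and runtime overhead:

```python
# Back-of-envelope weight memory for a 12B-parameter model.
params = 12e9
bf16_gib = params * 2 / 1024**3   # 2 bytes per parameter in BF16
fp8_gib = params * 1 / 1024**3    # 1 byte per parameter in FP8
saving = 1 - fp8_gib / bf16_gib
print(f"BF16: {bf16_gib:.1f} GiB, FP8: {fp8_gib:.1f} GiB, saving: {saving:.0%}")
```

"Dynamic" here refers to the activation scales being computed at runtime per batch, so no calibration dataset is needed; the weight scales are still fixed offline.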
## DeepSeek Coder V2 Lite Instruct FP8
RedHatAI · Other (license) · Large Language Model · Transformers · 11.29k downloads · 7 likes

FP8-quantized version of DeepSeek-Coder-V2-Lite-Instruct, optimized for inference efficiency and suitable for commercial and research use in English.